Very Fast Identification of tRNA in Genomic DNA
Authors
Abstract
The identification of functional regions in genomic DNA increasingly relies on coupling experimental work with computer processing of sequences. Newly sequenced fragments of a genome may be analysed with computer programs, and the result of an automated search may guide a new set of experiments. In this context, this paper focuses on the identification of tRNA sequences. The role of tRNA in protein synthesis is of key importance, and over the last 20 years the tRNA molecule has been extensively studied. The corresponding gene is a short sequence which folds into the form of a clover leaf. A large number of sequences are available and aligned [1]. Conserved regions appear in the alignment, which consists of only 76 positions.
There are basically two approaches to the identification of tRNA genes. It is either part of a general-purpose method designed for searching and/or folding RNA sequences [2, 3, 4], or a self-contained method tailor-made for searching tRNA genes, such as [5, 6]. As one can expect, reported results are usually more accurate in the latter case. Whatever the approach, priority is rarely given to how quickly a search is performed. Nevertheless, within years, complete genomes of various organisms will be available, and fast sequence scanning is already becoming a concern.
The first really reliable algorithm, tRNAscan [5], is based on the use of "weight" or "consensus" matrices, which make the definition of the RNA motifs more flexible and whose use is often part of a general search strategy [7]. The algorithm depends on two essential characteristics of the primary and secondary structure of the tRNA gene: (1) the presence of invariant (i.e. universal) and semi-invariant nucleotides located in two highly conserved regions; (2) the clover leaf structure consisting of four arms (paired bases) and three loops (unpaired bases), one of which is of variable size. Such an approach was pushed further to improve both the flexibility and the speed of the algorithm.
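As an illustration of the "weight" or "consensus" matrix idea mentioned above, the following is a minimal sketch of how such a matrix can be built from aligned sequences and slid along a genomic sequence. It is not the tRNAscan implementation: the function names, the pseudocount, the uniform background, the threshold, and the toy alignment are all assumptions made for the example.

```python
import math

def build_weight_matrix(aligned, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds weight matrix from equal-length aligned sequences.

    Assumes a uniform background frequency; the pseudocount keeps unseen
    bases from producing log(0).
    """
    length = len(aligned[0])
    background = 1.0 / len(alphabet)
    matrix = []
    for pos in range(length):
        column = [seq[pos] for seq in aligned]
        total = len(column) + pseudocount * len(alphabet)
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (start, score) for every window scoring at least threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Characters outside the alphabet (e.g. N) contribute a neutral 0.
        score = sum(matrix[i].get(base, 0.0) for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Toy alignment (hypothetical data, not taken from the tRNA compilation [1]).
toy_alignment = ["GTTCGA", "GTTCAA", "GTTCGA"]
toy_matrix = build_weight_matrix(toy_alignment)
hits = scan("AAAGTTCGACCC", toy_matrix, threshold=5.0)
print([start for start, _ in hits])
```

Because each position contributes a graded score rather than a strict match or mismatch, a window with one unusual base can still pass the threshold, which is the flexibility the text attributes to weight matrices.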
To address the question of flexibility, in particular in defining arms, each of the ten possible pairs (regardless of orientation) is given a weight. These values reflect the
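The pair-weight idea can be sketched as follows. Four nucleotides give ten unordered pairs (four identical pairs plus six mixed ones), and a candidate arm is scored by summing the weights of its facing bases. The weight values below are hypothetical placeholders chosen for the example, not the values used in the paper; `arm_score` and `PAIR_WEIGHT` are likewise illustrative names.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"  # genomic DNA alphabet

# The ten unordered pairs: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
UNORDERED_PAIRS = [frozenset(p) for p in combinations_with_replacement(BASES, 2)]

# Hypothetical weights: every pair starts as a mismatch, then the
# Watson-Crick pairs and the G-T (G-U in RNA) wobble pair are rewarded.
MISMATCH = -1.0
PAIR_WEIGHT = {pair: MISMATCH for pair in UNORDERED_PAIRS}
PAIR_WEIGHT[frozenset("CG")] = 3.0
PAIR_WEIGHT[frozenset("AT")] = 2.0
PAIR_WEIGHT[frozenset("GT")] = 1.0

def arm_score(five_prime, three_prime):
    """Score a candidate arm from its two strands, both written 5'->3'.

    Base i of the 5' strand faces base i of the reversed 3' strand.
    Using frozenset keys makes the weight independent of orientation.
    """
    if len(five_prime) != len(three_prime):
        raise ValueError("both strands of an arm must have the same length")
    return sum(
        PAIR_WEIGHT.get(frozenset((a, b)), MISMATCH)
        for a, b in zip(five_prime, reversed(three_prime))
    )

print(len(PAIR_WEIGHT))              # number of distinct unordered pairs
print(arm_score("GCGGA", "TCCGC"))   # four G-C pairs and one A-T pair
```

A graded score of this kind lets an arm with a single wobble or mismatched pair still be accepted when the rest of the stem is strong, instead of requiring perfect complementarity.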